AN ALGORITHM FOR AUTOMATING THE REGISTRATION OF
USDA SEGMENT GROUND DATA TO LANDSAT MSS DATA


I. SUMMARY 

An algorithm for automating the segment ground data shifting process
used by the U.S. Department of Agriculture (USDA)/Statistical Reporting Service
(SRS) has been developed. The segment shifting process is the procedure
used in the registration of SRS segment ground data to Landsat MSS data.

The algorithm is referred to as the Automated Segment Matching Algorithm
(ASMA). The initial evaluation of this algorithm indicates that it has
good potential for replacing most of the manual segment shifting procedure
presently used by the SRS. The algorithm will be tested in a Sacramento
Valley, California, study during FY1982.

II. INTRODUCTION 

The first part of the scene-to-map registration task under the AgRISTARS
Domestic Crops and Land Cover (DCLC) project was to evaluate the registration
accuracy of the P-format Landsat data. This was done and reported in
AgRISTARS Report DC-Y1-04069 (NSTL/ERL-197), April 1981 (reference 1).

The second part of this task was to develop an algorithm that would
automate the process of segment shifting. This process, given an initial
or gross scene-to-map registration, translates the SRS segment outline
plus or minus x columns of Landsat data and plus or minus y rows of
Landsat data to locate a better fit of segment ground data to Landsat
MSS data.


III. SRS SEGMENT 

The SRS segment (reference 2) is an area of land which has been randomly
selected by USDA as a sample unit in some land use stratum. It is usually
divided into several ownership tracts, and further subdivided into fields.

Each field represents an area on the ground to be considered homogeneous with
respect to ground cover. The boundaries of these areas do not overlap. There
is no restriction (other than computer file storage space limitations) on
the complexity of the field boundaries.

Trained enumerators visit each segment and record the crop or ground
cover and size of the various fields. Segment boundaries and locations
are marked on aerial photos and USGS topographic quadrangle maps.

Segment digitization is the process of converting segments from
fields drawn on aerial photographs or topographic maps to a file of
coordinates in a geographic coordinate system. Locations of points on the
photos or maps are measured by hand using a data tablet digitizer, in
conjunction with interactive EDITOR software subsystems, and assembled into a
convenient computer-readable data structure. This data structure
contains all the topological, geographic, and naming information needed to
completely reconstruct the segment.

IV. SRS PROCEDURE FOR MANUAL SEGMENT SHIFTING 

The initial registration of the segment to the Landsat data is
performed using a least-squares fitting procedure based on control points.

Control points are features on both the map and the Landsat scene whose
coordinate pairs are used to compute the transformation coefficients.

Once the registration has been completed, the segment can be plotted
(using the EDITOR software) in the Landsat coordinate system. An example
of such a plot is shown in Figure 1.

Corresponding areas of the Landsat data are also plotted using a
raster plotter and showing each Landsat pixel as a pattern of dots.

Examples of this type of plot are shown in Figures 2 and 3. These raster
plots are produced to the same scale as the segment plot, and represent
larger areas than the digitized segment. This allows for the shifting
of the segment to find a better local fit. The numbers at the top and
bottom of the plots in Figures 1, 2, and 3 represent Landsat columns. The
numbers at the sides represent Landsat rows.

The process of obtaining the shift numbers is begun by selecting
either the plot of the channel 2 data (Figure 2) or the plot of the
channel 4 data (Figure 3). The one chosen should best represent the
patterns that appear in the plot of the segment (Figure 1).

The plots are usually placed on a back-lighted table and the corners
of the rectangle that circumscribes the segment (Figure 1) are aligned
with the four X's in the raster plot (Figure 2 or Figure 3). For this
particular segment, the X's are where rows 180 and 215 intersect columns
780 and 814. The initial registration is achieved once the corners of the
segment rectangle overlay the four X's on the raster plot of the Landsat data.

If the segment plot can be moved up or down, or to the right or left
(or both) so that it better matches the pattern in the Landsat plot, then
the shift numbers for the better fit are recorded in terms of ± columns
and ± rows. For the segment in Figure 1, the shift numbers were determined
to be 1 row and 0 columns using this manual shift method.
V. AUTOMATED SEGMENT MATCHING ALGORITHM (ASMA)

A. Initial Registration 

As in the case of the manual shifting method, the automated algorithm
requires an initial registration as a starting point about which to search
for a better fit. As presently coded, the algorithm can use either the
registration that is already available in the annotation record of a
P-format CCT or the USDA/SRS initial registration referred to earlier.

The P-format registration information is given in terms of HOTINE
tick marks (reference 1). The SRS initial registration is usually based
on a global cubic polynomial determined by control points chosen by SRS.
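A global cubic polynomial fitted to control points by least squares can be sketched as follows. This is an illustration under stated assumptions, not the SRS code: the function names, the full ten-term bivariate cubic, and the use of NumPy's `lstsq` are choices made here. Each Landsat row and column is modeled as a cubic in the map coordinates, so at least ten control points are needed.

```python
import numpy as np

def cubic_terms(x, y):
    """Columns of a full bivariate cubic: 1, x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y,
                            x ** 3, x * x * y, x * y * y, y ** 3])

def fit_cubic(x, y, rows, cols):
    """Least-squares cubic fit of Landsat row and column to map coordinates (x, y).

    Centre or scale x and y (for example, subtract the mean longitude and
    latitude) before calling, so the cubic columns stay well conditioned.
    """
    a = cubic_terms(x, y)
    if a.shape[0] < a.shape[1]:
        raise ValueError("a cubic fit needs at least 10 control points")
    row_coef, *_ = np.linalg.lstsq(a, np.asarray(rows, dtype=float), rcond=None)
    col_coef, *_ = np.linalg.lstsq(a, np.asarray(cols, dtype=float), rcond=None)
    return row_coef, col_coef

def map_to_landsat(row_coef, col_coef, x, y):
    """Evaluate the fitted transformation at new map coordinates."""
    a = cubic_terms(x, y)
    return a @ row_coef, a @ col_coef
```

With more control points than coefficients, `lstsq` returns the coefficients that minimize the squared residuals at the control points, which is the sense in which the registration is a least-squares fit.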

B. Segment Reconstruction 

The segment description information resulting from the digitizing
process along with the initial registration can be used to reconstruct the
segment in the Landsat coordinate system. The file used during the segment
digitizing process contains the digitizer plate coordinates and mapping
coefficients that relate the plate coordinates to latitude and longitude.

The initial registration can then be used to relate the latitude and
longitude to Landsat rows and columns.

The segment is reconstructed at half Landsat row and column intervals
by rounding the computed Landsat coordinates to the nearest half row
and half column. The segment is reconstructed as an array where 1's represent
boundary points, 0's represent points outside the segment, and USDA/SRS
assigned field numbers represent each field within the segment. For
the technique used in this algorithm, boundary points are defined as points
that touch any line connecting two vertices.

For example, in the figure at right, v1, v2, and v3 are typical
vertices from the SRS segment file, which are converted to Landsat rows
and columns and computed to within 1/2 Landsat row and 1/2 Landsat
column. Any 1/4 Landsat pixel that touches the line connecting two
vertices is assigned a boundary value of 1.
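The boundary-marking rule can be sketched on a half-row by half-column grid, where each cell is one 1/4 Landsat pixel. Assumptions made here: grid index 2r corresponds to Landsat row r relative to an origin placed at the top-left of the segment, and "touches the line" is approximated by densely sampling points along each edge and marking the cells they fall in. The function names are hypothetical, and filling field interiors with field numbers is not shown.

```python
import numpy as np

def to_half_grid(row, col):
    """Round a Landsat (row, col) to the nearest half row/column, as integer half-grid indices."""
    return int(round(2 * row)), int(round(2 * col))

def mark_edge(grid, start, end, origin):
    """Set to 1 every half-grid cell the edge from `start` to `end` passes through.

    start, end, origin are Landsat (row, col); origin maps to grid[0, 0] and
    must not lie below or right of any vertex. Dense sampling along the edge
    approximates "any 1/4 pixel the line touches".
    """
    r0, c0 = to_half_grid(*start)
    r1, c1 = to_half_grid(*end)
    orow, ocol = to_half_grid(*origin)
    steps = 4 * max(abs(r1 - r0), abs(c1 - c0), 1)
    for t in np.linspace(0.0, 1.0, steps + 1):
        r = int(round(r0 + t * (r1 - r0))) - orow
        c = int(round(c0 + t * (c1 - c0))) - ocol
        grid[r, c] = 1
    return grid

def rasterize_outline(vertices, origin, shape):
    """Boundary array for a closed polygon: 1 on the outline, 0 elsewhere."""
    grid = np.zeros(shape, dtype=int)
    for start, end in zip(vertices, vertices[1:] + vertices[:1]):
        mark_edge(grid, start, end, origin)
    return grid
```

A square whose corners are two Landsat rows and columns apart spans five half-grid cells per side, so its outline marks 16 cells and leaves the center at 0.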

For the boundary matching part of the algorithm, the field numbers are
ignored. The field numbers are used for the second-stage test of within-field
dispersion described in Section VI.B, Second Stage.

An example of the segment boundaries reconstructed by this technique
is given in Figure 4. However, in practice, the algorithm does not write
the reconstructed segment to a device; the algorithm builds the segment
as an array and holds it in memory during the shifting process.

C. Preparation of Landsat Data-Edge Enhancement

The search window in the Landsat data is determined by the
initial registration; that is, the segment vertices can be converted to
Landsat row and column numbers. From these, the maximum row and column
and the minimum row and column for the segment can be determined. The
search window in the Landsat data is taken to be 10 rows and 10 columns
more than each maximum and 10 rows and 10 columns less than each minimum.
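A minimal sketch of the search-window rule, assuming the vertices have already been converted to Landsat (row, col) by the initial registration. Rounding fractional extremes outward with floor and ceiling is an assumption made here; the report does not say how half-row coordinates are handled.

```python
import math

def search_window(vertices, pad=10):
    """(min_row, max_row, min_col, max_col) of the Landsat search window.

    vertices: segment vertices already converted to Landsat (row, col).
    pad: rows/columns added beyond each extreme (10 in the report).
    """
    rows = [r for r, _ in vertices]
    cols = [c for _, c in vertices]
    return (math.floor(min(rows)) - pad, math.ceil(max(rows)) + pad,
            math.floor(min(cols)) - pad, math.ceil(max(cols)) + pad)
```

For the segment used as the manual-shifting example (rows 180 to 215, columns 780 to 814), this gives rows 170 to 225 and columns 770 to 824.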

For the given search window, as a 2 by 2 sliding window is moved through
the data, the following gradient values, based on the diagram
and equations below, are computed.
These values are computed in bands 5 and 7 and added to determine the
total gradient value for each point. The maximum total value is set to 10
because the intent of the edge enhancement is to use relative values to
determine if a pixel is different from its neighbor. As will be explained
later, the algorithm uses these relative values by summing them when they
coincide with 1's (boundaries) in the reconstructed segment. It was
determined by earlier work that if gradient values are allowed to exceed a
saturation value, then the algorithm will fail.

The Multispectral Scanner of Landsat can sense large differences in
radiometric values between certain fields, whereas the segment data
(ground truth) show that the fields within a segment are different but
do not indicate the degree of spectral difference. Therefore, if the
goal of the algorithm is to match the field patterns in the segment with
patterns in the Landsat data, the edges between two fields that are
spectrally very different must not be weighted more heavily than the edges
between other fields.


After each X_i (i = 1, 2, 3) is computed for a 2 by 2 window, the window
is moved one column to the right and the process is repeated. After all
columns are covered for the two rows, the window is moved down one row
and the process is repeated for the next two rows. This process
continues until the larger search window has been covered.
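The diagram and the three gradient equations X_1 to X_3 are not reproduced in this text, so the sketch below substitutes a different, plainly named operator: the Roberts cross magnitude |a - d| + |b - c| on each 2 by 2 window. Only the surrounding bookkeeping follows the report: the per-band values are added for bands 5 and 7, the total is capped at 10, and each result is placed at its window's half-row/half-column center. The NumPy slicing visits the same windows as the column-then-row sweep described above. All names are hypothetical.

```python
import numpy as np

SATURATION = 10  # cap on the total gradient value, as stated in the text

def roberts_cross(band):
    """Stand-in 2x2 gradient |a - d| + |b - c| for every 2x2 window
           a b
           c d
    Entry [i, j] belongs to the window whose top-left pixel is (i, j).
    This is NOT the report's X_1..X_3; it is a labelled substitute."""
    px = np.asarray(band, dtype=float)
    a, b = px[:-1, :-1], px[:-1, 1:]
    c, d = px[1:, :-1], px[1:, 1:]
    return np.abs(a - d) + np.abs(b - c)

def edge_half_grid(band5, band7):
    """Add the band 5 and band 7 gradients, cap at SATURATION, and store each value
    at its window's half-row/half-column centre. Other cells are NaN 'holes'."""
    total = np.minimum(roberts_cross(band5) + roberts_cross(band7), SATURATION)
    nrows, ncols = np.asarray(band5).shape
    half = np.full((2 * nrows - 1, 2 * ncols - 1), np.nan)
    half[1::2, 1::2] = total
    return half
```

Any 2 by 2 operator can be dropped into `roberts_cross` without changing the placement and saturation logic.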

The output values of the sliding window occur at Landsat half rows
and half columns, and the resulting output array initially has 'holes'
where no output values were determined. These holes are filled by
averaging the eight (five at the file edges) neighboring values.
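The hole-filling step can be sketched as below. Because only every other half-grid cell holds a computed value, some of a hole's eight neighbors are themselves holes; this sketch assumes each hole takes the mean of whichever neighbors were actually computed, read from the unfilled array so that filled values do not feed into later holes. That interpretation, and the function name, are assumptions.

```python
import numpy as np

def fill_holes(half):
    """Replace each NaN with the mean of the computed (non-NaN) values among its
    up to eight neighbours; neighbours are read from the unfilled input."""
    out = half.copy()
    nrows, ncols = half.shape
    for i in range(nrows):
        for j in range(ncols):
            if np.isnan(half[i, j]):
                block = half[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                vals = block[~np.isnan(block)]
                out[i, j] = vals.mean() if vals.size else 0.0
    return out
```

Slicing with `max(i - 1, 0)` automatically trims the neighborhood at the array edges, which mirrors the reduced neighbor count the text mentions for the file edges.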

Figure 5 shows an example of an output array resulting from the edge
enhancement process. The array is actually held in memory during the
shifting process and is not written to a device. It is shown in Figure
5 for example only. Because the output array represents half rows and
half columns, the row and column numbers are double those of the original
Landsat row and column.
D. Matching Process

The reconstructed segment array, containing 1's for boundaries
and 0's elsewhere (field numbers are also taken to be 0 for this part of
the algorithm), is referenced to the edge array (containing gradient values
0-10) by the initial registration. The edge array is multiplied by the
segment array and the results are summed. The segment array is then
shifted in half-column and half-row increments, computing the sum for each
shift. The flow of the algorithm is illustrated in Figure 6.
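The multiply-and-sum search can be sketched as follows. Assumptions made here: both arrays are on the same half-grid, `(top, left)` is where the segment array's corner falls in the edge array at the initial registration, and the search runs over ±10 half steps (±5 Landsat rows and columns), which yields the 21 × 21 = 441 sums mentioned below; the report does not state the shift range explicitly.

```python
import numpy as np

def shift_sums(edge, seg, top, left, max_shift=10):
    """Sum of edge * segment for every half-step shift in [-max_shift, +max_shift]^2.

    edge: filled half-grid edge array; seg: 0/1 boundary array (fields set to 0).
    (top, left): half-grid position of seg[0, 0] in edge at the initial registration.
    Returns an array indexed [dr + max_shift, dc + max_shift], in half steps.
    """
    h, w = seg.shape
    n = 2 * max_shift + 1
    sums = np.zeros((n, n))
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r, c = top + dr, left + dc
            if r < 0 or c < 0 or r + h > edge.shape[0] or c + w > edge.shape[1]:
                raise ValueError("shift leaves the edge array; enlarge the search window")
            window = edge[r:r + h, c:c + w]
            sums[dr + max_shift, dc + max_shift] = np.sum(window * seg)
    return sums
```

A shift of dr half steps corresponds to dr / 2 Landsat rows, so the index of the largest sum converts directly into shift numbers in half-row and half-column units.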

Table 1 shows the standardized variables for these sums. The
standardized variable is derived using the following equation:

    Standardized variable = (Sum - Mean) / Standard Deviation

The negative variables, i.e., those sums that were less than the mean of
the 441 sums, were printed as zeros because they represent minimal matching
between the segment pattern and the Landsat data. The largest sum
represents the shift position with the most highly defined edges in terms of
the segment boundaries. This shift position is taken as the best match
in the preliminary (first) stage.
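The standardization and best-shift selection can be sketched as follows. The population standard deviation (NumPy's default) is an assumption, since the report does not say which form was used, and the sums array is assumed to be square and indexed by half-step shift offset by the maximum shift. If all sums are equal the standard deviation is zero and the standardized values are undefined.

```python
import numpy as np

def standardize(sums):
    """(Sum - Mean) / Standard Deviation over all shift sums (population SD assumed)."""
    s = np.asarray(sums, dtype=float)
    return (s - s.mean()) / s.std()

def as_printed(z):
    """Table 1 convention: negative standardized values are printed as 0."""
    return np.where(z < 0, 0.0, z)

def best_shift(sums):
    """Landsat (row, col) shift of the largest sum and its standardized value.
    sums is square, indexed by half-step shift offset by max_shift."""
    sums = np.asarray(sums, dtype=float)
    max_shift = (sums.shape[0] - 1) // 2
    i, j = np.unravel_index(np.argmax(sums), sums.shape)
    z = standardize(sums)
    return (i - max_shift) / 2, (j - max_shift) / 2, float(z[i, j])
```

Dividing the index offset by 2 converts half steps into Landsat rows and columns, so a maximum one half step below center is reported as a shift of +0.5 rows.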

The results shown in Table 1 are for segment 305, which was used as the
example in the manual shifting procedure described earlier. The algorithm
found the best match to be at a shift of +0.5 rows and 0 columns. This
compares to +1 row and 0 columns determined using the manual shifting
procedure.

VI. RESULTS 

A. First Stage 

The algorithm has been evaluated initially by comparing its shift
numbers for 30 segments with the shift numbers derived from the same 30
segments using the manual shifting technique (Table 2). Landsat 2 data
from scene 21980-16264 (Kansas) were used for this evaluation. The
segments were digitized from 1:24,000 scale USGS quadrangle sheets.

The manual shifting was performed by two different SRS personnel
working independently. Both the algorithm and the SRS personnel used
the SRS initial registration as a starting point in the shifting process.

Of the 30 segments used for this evaluation, 20 were June Enumerative
Survey (JES) segments (as described in Section III) and ten were Land Cover
Survey (LCS) segments. Fields in the LCS segments were grouped more
according to land use than to plant species. For example, fields of
corn, hay, and soybeans within an LCS segment were all grouped into one
field and called cropland, whereas if these fields occurred in a JES
segment they would be individually labeled as corn, hay, and soybeans.
Therefore, it was anticipated that the correlation between the spectral
patterns recorded by Landsat and the field boundaries as described in
LCS segments would be low. The results of this evaluation (Table 2)
show this to be true. The results also show that LCS segments are more
difficult to match using the manual shifting method as well. This is
demonstrated by the larger discrepancies in the shift numbers for the
LCS segments between person A and person B (Table 2).

However, the information gained by attempting to match both JES
segments and LCS segments will be important in training the algorithm
as to when it cannot match the Landsat data with the segments based on
boundary information alone or when the boundary match is questionable.

The results of the automatic segment shifting process shown in Table
2 (First Stage) seem promising when compared to the shift numbers
determined by the manual shifting procedure. However, for future purposes,
some method is required to determine the reliability of shift numbers,
independent of comparing them with the manual shift numbers. The
algorithm shift numbers in Table 2 (First Stage) are based solely on the
maximum sum (as described in Section V.D) within the shift window.

A possible procedure for determining reliability could be based on
the standardized values that are determined during the shifting process,
as shown in the example in Table 1. Given that the Landsat data do
contain homogeneous patterns for the search window area, the greater
the standardized variable, the more reliable the shift number. In the
case of nonhomogeneous areas, the standardized variables would be low
because no one pattern would stand out above the others.

Thus, a cut-off point or threshold must be chosen to separate reliable
shift numbers (based on the standardized variables) from questionable
and unreliable ones. For this reason the shift information obtained
from the attempts to match the LCS segments (Table 2) proved most helpful.
Using the standardized variable for each maximum sum for each of the 30
segments (given in Table 2), a distribution graph was constructed
(Figure 7). The graph shows the standardized values of the 30 segments
and indicates whether or not the segments were matched. A segment was
considered matched if it was within ±1 column and ±1 row of either
person A or person B in Table 2.
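The "matched" criterion used to build Figure 7 can be written as a short predicate. Shifts are taken here as (row, column) pairs in Landsat units; the function name and tuple convention are hypothetical.

```python
def is_matched(alg, person_a, person_b, tol=1.0):
    """True if the algorithm's (row, col) shift is within `tol` rows AND `tol`
    columns of either analyst's manual shift."""
    def close(ref):
        return abs(alg[0] - ref[0]) <= tol and abs(alg[1] - ref[1]) <= tol
    return close(person_a) or close(person_b)
```

Segment 305's algorithm shift of (+0.5, 0) is within one row and one column of a manual shift of (+1, 0), so it counts as matched.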

Based on the distribution in Figure 7, it appears that if a
standardized value is above 3.5 or 3.6, the shift numbers corresponding to that
value are reliable. Therefore, the cut-off point was taken to be 3.6
for follow-on investigations. The graph in Figure 7 also illustrates
that there is a range of standardized values for a group of segments
whose shift numbers are questionable. Eight of the segments within this
range were matched and ten were not. The lowest standardized value within
the matched group was 2.5. A determination of the lower limit of this
questionable group had to be made. It was decided that a segment whose
corresponding standardized value was 2.0 or less would be flagged by
the algorithm as impossible to match by this procedure. By requiring
a maximum sum more than 2 standard deviations above the mean of
the 441 shift sums, the algorithm ensures at least some pattern correlation
between the segment and Landsat data before proceeding to a second-stage
test of within-field dispersion.
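The two thresholds chosen above split segments into three groups by the standardized value of their maximum sum. A minimal sketch, with hypothetical names; a value of exactly 2.0 is treated as unmatchable, following the "2.0 or less" wording.

```python
RELIABLE = 3.6  # at or above: first-stage shift numbers are taken as reliable
FLOOR = 2.0     # at or below: flagged as impossible to match by this procedure

def classify(z_max):
    """Group a segment by the standardized value of its maximum shift sum."""
    if z_max >= RELIABLE:
        return "reliable"
    if z_max <= FLOOR:
        return "unmatchable"
    return "questionable"
```

Only the "questionable" group goes on to the second-stage test described next.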

B. Second Stage 

Within-field dispersion is being investigated as a possible test
to use on the group of segments with questionable shift numbers.
Within-field dispersion, as computed by this algorithm, is the sum of the
variances of all fields greater than 19 points (1/4 pixels) within a segment.
The variances are computed on the result of the edge enhanced data
described in Section V.C, Preparation of Landsat Data-Edge Enhancement.

The variances for all fields are summed to obtain the within-field
dispersion (WFD) number. The WFD numbers are computed for all shifts
within the search area for which the standardized value is greater than
2.0. (As described earlier, if any standardized value is 3.6 or greater,
then the shift numbers are considered reliable and the WFD number is not
computed.)
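A sketch of the WFD number for one shift position. Assumptions made here: `labels` is the reconstructed segment array aligned with the edge values at that shift, field numbers are integers other than 0 (outside) and 1 (boundary), and the population variance is used. The report does not state how the WFD numbers pick a shift; choosing the smallest WFD (most homogeneous fields) is shown only as a plausible, hypothetical rule.

```python
import numpy as np

def within_field_dispersion(edge_window, labels, min_points=20):
    """Sum of per-field variances of edge values.

    labels: same shape as edge_window; 0 = outside, 1 = boundary, other
    integers = field numbers. Fields with fewer than min_points cells
    (i.e. not "greater than 19 points") are skipped."""
    total = 0.0
    for field in np.unique(labels):
        if field in (0, 1):
            continue
        vals = edge_window[labels == field]
        if vals.size >= min_points:
            total += vals.var()
    return total

def pick_min_wfd(wfd_by_shift):
    """Hypothetical selection rule: the (row, col) shift with the smallest WFD."""
    return min(wfd_by_shift, key=wfd_by_shift.get)
```

A perfectly uniform field contributes zero, so a shift that lines field interiors up with smooth regions of the edge array yields a low WFD number.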

The WFD numbers are used in an attempt to match the segments in the
group where the shift numbers are questionable and the segments do not
match (Figure 7). For the 30 segments used in the evaluation, 10 segments
fall into this category. However, because there are 18 segments with
questionable shift numbers, the second-stage test was run on all 18.

The results of the second-stage test of within-field dispersion are
shown in Table 2. Of the ten segments within the questionable category
that were not matched, 4 (117, 186, 8282, and 8384) were matched using
the second-stage WFD test. Of the 8 segments in the questionable
category that were matched, the WFD test mismatched 2 (9294 and 421).

In order for the algorithm to determine which segments were matched
in the questionable set, the 12 reliable shift numbers were used to
compute means and standard deviations for row and column shifts. The
mean shift for the rows was 2.42 with a standard deviation of 0.85; the
mean shift for the columns was -0.71 with a standard deviation of 1.34.
Therefore, any row shift in the range of 0.5 to 4.0 would be within 2
standard deviations (rounded to the nearest half row) of the mean row
shift, and likewise, any column shift in the range of -3.5 to 2.0 would
be within 2 standard deviations (rounded to the nearest half column)
of the mean column shift. Using these criteria to determine which
segments were matched, the algorithm would have matched all the JES
segments except one (9294). If the two-standard-deviation criterion had been
applied before the second-stage test of within-field dispersion, then
all JES segments would have been matched.
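The two-standard-deviation ranges quoted above can be reproduced from the stated means and standard deviations. Rounding each limit to the nearest half is done here with Python's `round` on twice the value; the function names are hypothetical, and the sample standard deviation in `range_from_shifts` is an assumption, since the report does not say which form it used.

```python
import statistics

def half_round(x):
    """Round to the nearest half row or column."""
    return round(2 * x) / 2

def two_sd_range(mean, sd):
    """(low, high) limits mean - 2 sd and mean + 2 sd, each rounded to the nearest half."""
    return half_round(mean - 2 * sd), half_round(mean + 2 * sd)

def range_from_shifts(shifts):
    """Same limits computed directly from the reliable segments' shifts (sample SD assumed)."""
    return two_sd_range(statistics.mean(shifts), statistics.stdev(shifts))
```

For the rows, 2.42 ± 1.70 gives 0.72 to 4.12, which rounds to 0.5 to 4.0; for the columns, -0.71 ± 2.68 gives -3.39 to 1.97, which rounds to -3.5 to 2.0, matching the ranges in the text.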

The fact that many of the land cover segments could not be matched
by the algorithm is not alarming. As stated earlier, many of these
segments are based more on land use than on plant species and
represent factors not necessarily sensed by the Landsat MSS. More
important is the fact that the algorithm should be able to determine
which segments it cannot match.
VII. FOLLOW-ON TESTING

For follow-on testing, the algorithm will use the procedures and
tests described in this report. That is, given a set of segment data
and the corresponding Landsat MSS data, the algorithm will determine
the set of all shift numbers based on the corresponding standardized
values. The set of reliable shift numbers (those whose corresponding
standardized values are 3.6 or above) will be used to determine which
shift numbers of the questionable set (those whose standardized values
are between 2.0 and 3.6) are incorrect. The criterion used will be the
two-standard-deviation test described earlier. Any segment whose shift
numbers are meaningless (those whose standardized values are less than
2.0) will be flagged by the algorithm as impossible to match by this
procedure.

For those segments whose shift numbers are in the questionable
group and those which failed the two-standard-deviation test, the
within-field dispersion number test will be used to determine a new
set of shift numbers, which must in turn pass the two-standard-deviation
test.
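The follow-on decision sequence can be sketched end to end, assuming the first-stage standardized value, first-stage shift, and (optionally) a second-stage WFD shift have already been computed for each segment. The dictionary layout, outcome labels, and function names are hypothetical; the thresholds, the 2-SD test on reliable shifts, and the ordering (2-SD test before the WFD stage) follow the text.

```python
import statistics

RELIABLE = 3.6  # standardized value at or above which first-stage shifts are trusted
FLOOR = 2.0     # at or below this, the segment is flagged as impossible to match

def half_round(x):
    return round(2 * x) / 2

def two_sd_range(values):
    m, sd = statistics.mean(values), statistics.stdev(values)
    return half_round(m - 2 * sd), half_round(m + 2 * sd)

def follow_on(segments):
    """segments: id -> {"z": max standardized value, "shift": (row, col),
    "wfd_shift": (row, col) from the second stage, or None}.
    Returns id -> (outcome, shift or None)."""
    reliable = [s["shift"] for s in segments.values() if s["z"] >= RELIABLE]
    if len(reliable) < 2:
        raise ValueError("need at least two reliable segments for the 2-SD test")
    row_lo, row_hi = two_sd_range([r for r, _ in reliable])
    col_lo, col_hi = two_sd_range([c for _, c in reliable])

    def passes(shift):
        return (shift is not None and row_lo <= shift[0] <= row_hi
                and col_lo <= shift[1] <= col_hi)

    result = {}
    for seg_id, s in segments.items():
        if s["z"] >= RELIABLE:
            result[seg_id] = ("first stage, reliable", s["shift"])
        elif s["z"] <= FLOOR:
            result[seg_id] = ("flagged: impossible to match", None)
        elif passes(s["shift"]):
            result[seg_id] = ("first stage, passed 2-SD test", s["shift"])
        elif passes(s.get("wfd_shift")):
            result[seg_id] = ("second stage (WFD)", s["wfd_shift"])
        else:
            result[seg_id] = ("not matched", None)
    return result
```

The returned mapping carries both the stage at which each segment was matched and the segments left unmatched, which is the form of output described in the next paragraph.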

The final output of the algorithm will be the list of segments
and the corresponding shift numbers for those that were matched. Also,
the output will indicate the stage of the algorithm in which each
segment was matched and identify those segments which were not matched.
This algorithm is scheduled to be tested in the Sacramento Valley,
California study during FY1982.
Figure 2. Plot of Channel 2 (Band 5) Landsat MSS Data for Area Containing Segment
Figure 3. Plot of Channel 4 (Band 7) Landsat MSS Data for Area Containing Segment
Figure 4. Plot of Segment 305 Based on Algorithm Reconstruction
Figure 6. Flow Chart for Segment Matching Algorithm, First-Stage Test for Matching Patterns
Figure 7. Standardized Values for the 30 Segments vs. Segment Matching Results
Table 1. Segment 305 Shift Sums Expressed as Standardized Variables [(Sum - Mean)/Standard Deviation], Where Negative Variables Are Printed as 0's
Table 1. Continued